1  Introduction


One  of  the  primary  tasks  of  a  computer  vision  system  is  to  reconstruct,  from  two- 
dimensional  images,  such  three-dimensional  properties  of  a  scene  as  the  shape,  motion, 
and  spatial  arrangement  of  objects.  In  monocular  vision,  an  important  goal  is  to  recover, 
from  time-varying  images,  the  relative  motion  between  a  viewer  and  the  environment,  as 
well  as  the  so-called  structure  of  the  environment.  The  structure  of  the  environment  is 
usually taken to be the collection of the relative distances of points on the surfaces in the
scene  from  the  viewer.  In  theory  at  least,  absolute  distances  can  be  determined  from  the 
image  data  if  the  motion  is  known. 

Three types of approaches (discrete, differential, and least-squares) have been pursued
in most of the earlier work in motion vision. Discrete methods establish correspondences
between images of a point in the scene in a sequence of images in order to recover motion
(see, for example, Prazdny [1979], Roach & Aggarwal [1980], Longuet-Higgins [1981],
Barnard & Thompson [1980], Mitiche [1984], Tsai & Huang [1984]). In the differential
approach,  the  optical  flow,  an  estimate  of  the  velocity  of  the  image  of  a  point  in  the 
scene,  as  well  as  the  first  and  second  partial  derivatives  of  the  optical  flow,  are  used  to 
determine  motion  and  the  local  structure  of  the  surface  of  the  scene  (see  Longuet-Higgins 
&  Prazdny  [1980],  Waxman  &  Ullman  [1983]).  In  the  least-squares  approach,  motion 
parameters  are  found  that  are  most  consistent  with  the  optical  flow  over  the  entire  image 
(see  Ballard  and  Kimball  [1981],  Bruss  &  Horn  [1983],  Adiv  [1985]). 

Amongst  the  shortcomings  of  the  discrete  methods  are  that  they  require  the  solution 
of point correspondence problems and that they are not very robust, since information
from  a  small  portion  of  the  image  is  used.  To  overcome  the  first  problem,  methods  have 
been  suggested  that  only  require  line  or  contour  correspondence  (see  for  example,  Tsai 
[1983],  Yen  &  Huang  [1983],  and  Aloimonos  &  Basu  [1986]);  however,  the  computation  is 
still  based  on  information  in  a  relatively  small  portion  of  the  image.  Differential  methods 
exploit  only  local  information  and,  therefore,  are  sensitive  to  inherent  ambiguities  in 
the  solution  when  data  is  noisy.  In  fact,  since  these  methods  essentially  work  with  a 
vanishingly  small  field  of  view,  they  are  unable  to  estimate  all  components  of  the  motion 
(Horn & Weldon [1986]). Methods based on the least-squares approach are more robust;
however, they make use of the unrealistic assumption that the computed optical flow is
a good estimate of the true motion field. Also, the iterative algorithms for estimating an
optical flow field are computationally expensive. This motivates investigation of methods
that directly use brightness derivative information at every image point. Several special
cases of the motion vision problem have already been addressed using this notion.

Negahdaripour [1986] investigates the problem of recovering motion directly from the
time-varying  image.  He  shows  that  the  solution  can  be  determined  easily  in  certain 
special  cases.  For  example,  when  the  motion  is  purely  rotational,  one  only  has  to  solve 
three  linear  equations  in  three  unknowns  (Aloimonos  &  Brown  [1984]  apparently  first 
reported  a  solution  to  this  problem,  followed  by  Horn  &  Weldon  [1986],  who  also  studied 
its robustness). Another special case of interest is the one where the depth values of
some  points  are  known.  The  depth  values  at  six  image  points  are  sufficient  to  recover 
the  translational  and  rotational  motion  from  six  linear  equations.  In  practice,  to  reduce 
the  influence  of  measurement  errors,  the  information  from  as  many  image  points  as 
possible  should  be  used.  If  the  variation  in  depth  is  negligible  in  comparison  to  the 
absolute distance of points on the surface, it can be assumed that the points are located
at essentially the same distance from the viewer, that is, the scene lies in a frontal
plane. In this case, Negahdaripour [1986] shows that the six translational and rotational
motion  parameters  can  also  be  obtained  from  six  linear  equations. 

When  the  scene  is  planar  (but  not  necessarily  a  frontal  plane)  the  results  of  the  least- 
squares  analysis  of  Negahdaripour  &  Horn  [1987]  can  be  applied.  This  approach  leads  to 
both  iterative  and  closed-form  solutions.  Negahdaripour  [1986]  further  presents  iterative 
and  closed-form  solutions  for  quadratic  surfaces.  Through  examples  using  synthetic  data, 
he  shows  that  the  iterative  method  gives  a  better  estimate  than  the  analytical  one  in  the 
case of quadratic surfaces, and that it is not as robust as the method that applies in the
case  of  planar  surfaces.  He  also  addresses  the  lack  of  robustness  of  certain  analytical 
methods  published  in  the  computer  vision  literature  for  recovering  motion,  and  explains 
why  the  iterative  method  of  Negahdaripour  &  Horn  [1987]  for  planar  surfaces  happens  to 
give the same estimate as the analytical method. Finally, Horn & Weldon [1986] give a
treatment of several direct methods when the motion is purely translational or purely
rotational. 

In  this  paper,  we  present  a  direct  method  for  recovering  the  motion  of  a  viewer  without 
making  any  assumptions  about  the  shapes  of  the  surfaces  in  the  scene.  We  only  impose 
a  simple  physical  constraint:  Depth  must  be  positive.  That  is,  a  point  on  a  surface  must 
be  in  front  of  the  viewer  in  order  for  it  to  be  imaged.  Unfortunately,  the  problem  is 
still quite difficult to solve when motion consists of both translation and rotation of
the  viewer.  We  therefore  first  address  the  problem  of  a  translating  observer  and  present 
two  examples.  We  then  explain  how  our  method  can  be  extended  if  the  motion  involves 
rotation  as  well  as  translation  of  the  viewer.  The  general  method  requires  considerably 
more  computation  than  the  special  one,  and  the  solution  may  not  be  unique  given  noisy 
data.  This  is  because  of  the  inherent  difficulty  in  distinguishing  between  rotation  about 
some  axis  parallel  to  the  image  plane  and  translation  along  an  axis  that  is  perpendicular 
to this rotational axis (Jerian & Jain [1983]). This problem is most apparent when the
field of view is small (Horn & Weldon [1986]). We demonstrate some of these problems
by means of an example.

2  Brightness  Change  Constraint  Equation 

A viewer-centered coordinate system is chosen, the image is formed on a plane perpendicular
to the viewing direction (which is along the z-axis), and the focal length is assumed
to be unity, without loss of generality (Figure 1). Let $\mathbf{R} = (X, Y, Z)^T$ be a point in the
scene that projects onto the point $\mathbf{r} = (x, y, 1)^T$ in the image. Assuming perspective
projection, we have

$$\mathbf{r} = \frac{1}{\mathbf{R} \cdot \hat{\mathbf{z}}}\,\mathbf{R},$$

where $Z = \mathbf{R} \cdot \hat{\mathbf{z}}$ is the distance of the point $\mathbf{R}$ from the viewer, measured along the
optical axis. This is referred to as the depth of the point.

Figure 1. Viewer-centered coordinate system and perspective projection.

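Now, suppose the viewer moves with translational and rotational velocities $\mathbf{t}$ and $\boldsymbol{\omega}$
relative to a stationary scene. Then a point in the scene appears to move with respect
to the viewer with velocity

$$\mathbf{R}_t = -\boldsymbol{\omega} \times \mathbf{R} - \mathbf{t}.$$

The corresponding point in the image moves with velocity (Negahdaripour & Horn [1987])

$$\mathbf{r}_t = -\hat{\mathbf{z}} \times \left( \mathbf{r} \times \left( \mathbf{r} \times \boldsymbol{\omega} - \frac{1}{\mathbf{R} \cdot \hat{\mathbf{z}}}\,\mathbf{t} \right) \right).$$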


The velocities of all image points, given by the above equation, taken collectively, define
a two-dimensional vector field that we call the image motion field. This has also at times
been referred to as the optical flow field (see Horn [1986] for a discussion of the distinction
between optical flow and the motion field).
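
As a sanity check on the motion field formula, the following minimal Python sketch evaluates $\mathbf{r}_t = -\hat{\mathbf{z}} \times (\mathbf{r} \times (\mathbf{r} \times \boldsymbol{\omega} - \mathbf{t}/Z))$ for one scene point and compares it with a finite-difference estimate obtained by moving the point with velocity $\mathbf{R}_t = -\boldsymbol{\omega} \times \mathbf{R} - \mathbf{t}$ and re-projecting. The function names and the numerical values are illustrative assumptions, not part of the original work.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def motion_field(R, t, w):
    """Image velocity r_t = -z x (r x (r x w - t / Z)) of scene point R."""
    Z = R[2]                              # depth Z = R . z
    r = (R[0] / Z, R[1] / Z, 1.0)         # perspective projection, unit focal length
    rxw = cross(r, w)
    a = tuple(rxw[i] - t[i] / Z for i in range(3))
    u = cross((0.0, 0.0, 1.0), cross(r, a))
    return tuple(-c for c in u)


def finite_difference_velocity(R, t, w, dt=1e-6):
    """Move R with velocity R_t = -w x R - t for a short time and re-project."""
    wxR = cross(w, R)
    R2 = tuple(R[i] + dt * (-wxR[i] - t[i]) for i in range(3))
    return ((R2[0] / R2[2] - R[0] / R[2]) / dt,
            (R2[1] / R2[2] - R[1] / R[2]) / dt)


# Hypothetical scene point and viewer motion (assumed values).
R = (0.3, -0.2, 2.0)
t = (0.1, 0.05, 0.4)
w = (0.02, -0.03, 0.01)
analytic = motion_field(R, t, w)
numeric = finite_difference_velocity(R, t, w)
```

The third component of the analytic velocity is zero, as expected for a vector field confined to the image plane.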

The  brightness  of  the  image  of  a  patch  on  the  surface  of  some  object  may  change 
for  a  number  of  different  reasons  including  changes  in  illumination  or  shading.  Image 
brightness  changes  will,  however,  be  dominated  by  the  effects  of  the  relative  motion  of 
the  scene  and  the  observer  provided  that  the  surfaces  of  the  objects  have  sufficient  texture 
and the lighting conditions vary slowly enough both spatially and with time. In this case,
brightness  changes  due  to  changing  surface  orientation  and  changing  illumination  can  be 
neglected  and  we  may  assume  that  the  brightness  of  a  small  patch  on  a  surface  in  the 
scene  remains  essentially  constant  as  it  moves.  Let  E(r,t)  denote  the  brightness  of  an 
image  point  r  at  time  t.  Then  the  constant  brightness  assumption  allows  us  to  write 

$$\frac{d}{dt} E(\mathbf{r}, t) = \mathbf{E}_r \cdot \mathbf{r}_t + E_t = 0,$$

where $E_t$ and $\mathbf{E}_r = (E_x, E_y, 0)^T$ denote the temporal and spatial derivatives of brightness,
respectively.

If  we  substitute  the  formula  for  the  motion  field  into  this  equation  we  arrive  at  the 
brightness  change  constraint  equation  for  the  case  of  rigid  body  motion  (Negahdaripour 
&  Horn  [1987]), 

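$$E_t + \mathbf{v} \cdot \boldsymbol{\omega} + \frac{1}{\mathbf{R} \cdot \hat{\mathbf{z}}}\,\mathbf{s} \cdot \mathbf{t} = 0,$$

where, for conciseness, we have defined

$$\mathbf{s} = (\mathbf{E}_r \times \hat{\mathbf{z}}) \times \mathbf{r} \quad \text{and} \quad \mathbf{v} = \mathbf{r} \times \mathbf{s}.$$

In component form, s and v are given by

$$\mathbf{s} = \begin{pmatrix} -E_x \\ -E_y \\ xE_x + yE_y \end{pmatrix} \quad \text{and} \quad \mathbf{v} = \begin{pmatrix} xyE_x + (y^2 + 1)E_y \\ -(x^2 + 1)E_x - xyE_y \\ yE_x - xE_y \end{pmatrix}.$$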


A useful immediate consequence of the way the vectors r, s, and v are defined is that
they form an orthogonal triad, that is,

$$\mathbf{r} \cdot \mathbf{s} = 0, \quad \mathbf{r} \cdot \mathbf{v} = 0, \quad \text{and} \quad \mathbf{s} \cdot \mathbf{v} = 0.$$
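
The definitions of s and v, their component forms, and the orthogonality of the triad can be verified with a few lines of Python. This is a minimal sketch; the helper names and the sample image point and gradient are assumptions made for illustration.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def s_and_v(x, y, Ex, Ey):
    """Return r, s = (E_r x z) x r and v = r x s at image point (x, y)."""
    r = (x, y, 1.0)
    Er = (Ex, Ey, 0.0)
    z = (0.0, 0.0, 1.0)
    s = cross(cross(Er, z), r)
    v = cross(r, s)
    return r, s, v


def s_and_v_components(x, y, Ex, Ey):
    """The same two vectors written out in component form."""
    s = (-Ex, -Ey, x * Ex + y * Ey)
    v = (x * y * Ex + (y * y + 1.0) * Ey,
         -(x * x + 1.0) * Ex - x * y * Ey,
         y * Ex - x * Ey)
    return s, v


# Hypothetical image point and brightness gradient (assumed values).
r, s, v = s_and_v(0.4, -0.3, 1.5, 0.7)
s_c, v_c = s_and_v_components(0.4, -0.3, 1.5, 0.7)
```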


Note that the brightness change constraint equation is not altered if we scale both
$Z = \mathbf{R} \cdot \hat{\mathbf{z}}$ and $\mathbf{t}$ by the same factor, k say. We conclude that we can determine only the
direction of translation and the relative depth of points in the scene; this well-known
ambiguity is here referred to as the scale-factor ambiguity of motion vision.

The brightness change constraint equation shows how the motion of the observer,
$\{\boldsymbol{\omega}, \mathbf{t}\}$, and the depth of a point in the scene, Z, impose a constraint on the spatial
and temporal derivatives of the image brightness corresponding to a point in the scene.
Unfortunately, we cannot recover both depth and motion using this constraint equation
alone. To show this, we solve the constraint equation for Z, in terms of the true motion
parameters $\{\boldsymbol{\omega}, \mathbf{t}\}$, to obtain

$$Z = -\frac{\mathbf{s} \cdot \mathbf{t}}{E_t + \mathbf{v} \cdot \boldsymbol{\omega}}.$$


Now, for an arbitrary motion $\{\boldsymbol{\omega}', \mathbf{t}'\}$, depth values that satisfy the brightness change
constraint equation can be determined using

$$Z' = -\frac{\mathbf{s} \cdot \mathbf{t}'}{E_t + \mathbf{v} \cdot \boldsymbol{\omega}'}$$

(provided that the denominator is not zero). This may suggest that, for any choice of the
pair $\{\boldsymbol{\omega}', \mathbf{t}'\}$, we can determine depth values such that the brightness change equation is
satisfied at every image point. Clearly an infinite number of solutions is possible since
the motion parameters can be chosen arbitrarily.
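
This ambiguity is easy to reproduce numerically. The sketch below (Python; all names and numbers are illustrative assumptions) generates $E_t$ at one image point from a true motion and depth, then solves the constraint for the depth implied by an arbitrary assumed motion. The residual of the constraint vanishes for the wrong motion as well, and scaling t by a factor scales the recovered depth by the same factor.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def temporal_derivative(s, v, t, w, Z):
    """E_t implied by the constraint E_t + v.w + (s.t)/Z = 0."""
    return -(dot(v, w) + dot(s, t) / Z)


def depth_for_motion(Et, s, v, t, w):
    """Depth -(s.t) / (E_t + v.w) for an assumed motion; None if undefined."""
    denom = Et + dot(v, w)
    if denom == 0.0:
        return None
    return -dot(s, t) / denom


def residual(Et, s, v, t, w, Z):
    """Left-hand side of the brightness change constraint equation."""
    return Et + dot(v, w) + dot(s, t) / Z


# Hypothetical values of s and v at one image point (assumed data).
s = (-1.5, -0.7, 0.39)
v = (0.583, -1.656, -0.73)
t_true, w_true, Z_true = (0.1, 0.05, 0.4), (0.02, -0.03, 0.01), 2.0
Et = temporal_derivative(s, v, t_true, w_true, Z_true)

# An arbitrary, incorrect motion still admits a depth satisfying the constraint.
t_wrong, w_wrong = (-0.3, 0.2, 0.1), (0.0, 0.05, -0.02)
Z_wrong = depth_for_motion(Et, s, v, t_wrong, w_wrong)

# Scale-factor ambiguity: tripling t triples the recovered depth.
Z_scaled = depth_for_motion(Et, s, v, tuple(3.0 * c for c in t_true), w_true)
```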

3  Positiveness  of  Depth 

The depth values of points on the visible portions of a surface in the scene are constrained
to be positive; that is, only points in front of the viewer are imaged. In theory, any
pair $\{\boldsymbol{\omega}', \mathbf{t}'\}$ that gives rise to negative depth values cannot be the correct one. Thus, the
problem is to determine the pair $\{\boldsymbol{\omega}, \mathbf{t}\}$ that gives rise to positive depth values (Z > 0)
over the whole image. One may well ask whether there is a unique solution; that is, given
that the brightness change equation is satisfied for the motion $\{\boldsymbol{\omega}, \mathbf{t}\}$ and the surface
Z > 0, is there another motion $\{\boldsymbol{\omega}', \mathbf{t}'\}$ and another surface Z' > 0 that satisfies the
brightness change equation at every point in the image? In general, this is possible
since, for example, an image of uniform brightness could correspond to an arbitrary
uniform surface moving in an arbitrary way. Hence, the brightness gradients (or lack of
brightness gradients) can conspire to make the problem highly ambiguous. In practice,
given a sufficiently textured scene, it is more likely that we have the opposite problem:
there is no solution because of noise in the images and the error in estimating brightness
derivatives; that is, every possible set of motion parameters, including the correct one,
leads to some negative depth values. So we have to invent a method for selecting a solution
that comes closest to being consistent with the image data.

The problem is rather difficult when both rotation and translation are unknown.
So we first restrict attention to the special case when either rotation is zero or
is at least known. We then show how the procedure may be extended to deal with the
general case.

4  Pure Translation or Known Rotation

Suppose that the rotational component of motion is known. Then we can write the brightness
change equation in the form

$$c + \frac{1}{Z}(\mathbf{s} \cdot \mathbf{t}) = 0,$$

where $c = E_t + \mathbf{v} \cdot \boldsymbol{\omega}$. For simplicity, we will write c for this quantity from now on.
The problem is still under-constrained if we restrict ourselves to the brightness change
constraint equation alone. At each point, we have one constraint equation. Given n
image points we have therefore n constraint equations, but n + 2 unknowns (n depth
values and two independent parameters required to specify the direction of translation).
Most  of  these  “solutions,”  however,  are  inconsistent  with  the  physical  constraint  that 
Z  >  0  for  every  point  on  the  visible  parts  of  the  surfaces  imaged.  If  we  impose  this 
additional  constraint  we  may  have  many,  only  one,  or  no  solution  depending  on  the 
variety  of  brightness  gradient  directions  in  the  image  and  the  amount  of  noise  in  the 
data,  as  mentioned  earlier.  Note  that  we  need  to  use  constraint  from  a  whole  image 
region  since  the  problem  remains  underconstrained  if  we  restrict  ourselves  to  information 
from  a  small  number  of  points  or  a  line. 

Before  we  discuss  the  general  method,  we  show  how  a  simplified  constraint  can  be 
used  to  recover  motion  provided  that  so-called  stationary  points  can  be  identified.  We 
then  present  a  more  general  procedure  for  locating  the  focus  of  expansion  (FOE)  and 
consequently  the  direction  of  motion. 

4.1  Stationary  Points 

An image point where c = 0 will be referred to as a stationary point (Horn & Weldon
[1986]). In the case of pure translation ($\boldsymbol{\omega} = 0$), a stationary point is one where the time
derivative of brightness, $E_t$, is zero. In order to exclude regions of uniform brightness from
consideration, we restrict attention to points with non-zero brightness gradient ($\mathbf{E}_r \neq 0$).
When c = 0, the brightness change equation reduces to

$$\frac{1}{Z}(\mathbf{s} \cdot \mathbf{t}) = 0,$$

and, if the depth is finite, this immediately implies that

$$\mathbf{s} \cdot \mathbf{t} = 0.$$

(We  assume  a  finite  depth  range  here — background  regions  at  essentially  infinite  depth 
have  to  be  detected  and  removed — see  Horn  &  Weldon  [1986].)  Since  Z  drops  out  of  the 
equation,  we  conclude  that  the  depth  value  cannot  be  computed  at  a  stationary  point. 
These  points,  however,  do  provide  strong  constraints  on  the  location  of  the  FOE. 

In fact, with perfect data, just two non-parallel vectors $\mathbf{s}_1$ and $\mathbf{s}_2$, at two stationary
points, provide enough information to recover the translational vector t. We note that t
is perpendicular to both $\mathbf{s}_1$ and $\mathbf{s}_2$ and so must be parallel to the cross-product of these
two vectors. That is,

$$\mathbf{t} = k\,(\mathbf{s}_1 \times \mathbf{s}_2),$$

where k is some constant that cannot be determined from the image brightness gradients
alone because of the scale-factor ambiguity.

This approach can be interpreted directly in terms of quantities in the image plane:
The brightness gradient at a stationary point is orthogonal to the direction to the FOE,
or, equivalently, the tangent of the iso-brightness contour at a stationary point passes
through the FOE. Intersecting the tangents of the iso-brightness contours at two different
stationary points allows us to determine the FOE (see the appendix for more details).
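
A small numerical illustration of the two-point construction follows. The Python sketch builds two synthetic stationary points for a chosen translation by making each brightness gradient orthogonal to the direction from the point to the FOE, and then recovers the direction of t as the cross product of the two s vectors. The chosen translation, the point positions, and all function names are assumptions for illustration only.

```python
def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def s_vector(x, y, Ex, Ey):
    """Component form of s = (E_r x z) x r at image point (x, y)."""
    return (-Ex, -Ey, x * Ex + y * Ey)


def stationary_gradient(x, y, foe, scale=1.0):
    """A gradient orthogonal to the direction from (x, y) to the FOE."""
    dx, dy = x - foe[0], y - foe[1]
    return (-dy * scale, dx * scale)


# Assumed true translation and the corresponding FOE t / (t . z).
t_true = (0.2, -0.1, 1.0)
foe_true = (t_true[0] / t_true[2], t_true[1] / t_true[2])

# Two synthetic stationary points with different gradient magnitudes.
p1, p2 = (0.5, 0.2), (-0.4, 0.3)
g1 = stationary_gradient(p1[0], p1[1], foe_true)
g2 = stationary_gradient(p2[0], p2[1], foe_true, scale=2.0)
s1 = s_vector(p1[0], p1[1], g1[0], g1[1])
s2 = s_vector(p2[0], p2[1], g2[0], g2[1])

# t is parallel to s1 x s2; dividing by the z-component gives the FOE.
t_est = cross(s1, s2)
foe_est = (t_est[0] / t_est[2], t_est[1] / t_est[2])
```

Only the direction of `t_est` is meaningful; its length and sign reflect the scale-factor ambiguity.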

In  practice  it  will  be  better  to  apply  least-squares  techniques  to  information  from 
many  stationary  points.  Because  of  noise  in  the  images,  as  well  as  quantization  error, 
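the constraint equation ($\mathbf{s} \cdot \mathbf{t} = 0$) will not be satisfied exactly. This suggests minimizing
the sum of the squares of the errors at every stationary point; that is, we minimize

$$\sum_{i=1}^{n} (\mathbf{s}_i \cdot \mathbf{t})^2 = \mathbf{t}^T \left( \sum_{i=1}^{n} \mathbf{s}_i \mathbf{s}_i^T \right) \mathbf{t}.$$

(In the above we have used the identity $\mathbf{s} \cdot \mathbf{t} = \mathbf{s}^T \mathbf{t}$.) Note that the resulting quadratic
form cannot be negative.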

Because of the scale-factor ambiguity we can only determine the direction of t, not
its magnitude, so we have to impose the constraint $\|\mathbf{t}\| = 1$ (otherwise we immediately
get the trivial solution t = 0). This leads to a constrained optimization problem. We can
create an equivalent unconstrained optimization problem, with a closed-form solution, by
introducing a Lagrange multiplier. We find that we now have to minimize

$$J = \mathbf{t}^T \left( \sum_{i=1}^{n} \mathbf{s}_i \mathbf{s}_i^T \right) \mathbf{t} + \lambda\,(1 - \mathbf{t}^T \mathbf{t}).$$


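The necessary conditions for stationary values of J are

$$\frac{\partial J}{\partial \mathbf{t}} = 0 \quad \text{and} \quad \frac{\partial J}{\partial \lambda} = 0.$$

Executing the indicated differentiations we arrive at

$$\left( \sum_{i=1}^{n} \mathbf{s}_i \mathbf{s}_i^T \right) \mathbf{t} = \lambda \mathbf{t} \quad \text{and} \quad \mathbf{t}^T \mathbf{t} = 1.$$

This is an eigenvalue-eigenvector problem; that is, $\{\mathbf{t}, \lambda\}$ is an eigenvector-eigenvalue pair
of the 3 x 3 matrix

$$\sum_{i=1}^{n} \mathbf{s}_i \mathbf{s}_i^T.$$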

This real symmetric matrix generally will have three eigenvalues and these eigenvalues will
be non-negative since the quadratic form we started off with was non-negative definite.
It is easy to see that J is minimized by the eigenvector associated with the smallest
eigenvalue, since substitution of the solution yields

$$J = \mathbf{t}^T(\lambda \mathbf{t}) + \lambda(1 - \mathbf{t}^T \mathbf{t}) = \lambda\,\mathbf{t}^T \mathbf{t} + \lambda - \lambda\,\mathbf{t}^T \mathbf{t} = \lambda.$$


It should be noted that with just two stationary points, the 3 x 3 matrix has rank
two since it is the sum of two dyadic products. The solution then is the eigenvector
corresponding to the zero eigenvalue. Geometrically, this is the vector normal to the
plane formed by $\mathbf{s}_1$ and $\mathbf{s}_2$, as discussed earlier. By the way, if $\mathbf{t}$ is an eigenvector, so is
$-\mathbf{t}$. While these two possibilities correspond to the same FOE, it may be desirable to
distinguish between them. This can be done by choosing the one that makes most depth
values positive rather than negative (see Horn & Weldon [1986]).
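
The eigenvector computation can be sketched in a few lines of standard-library Python. The code below accumulates the matrix of summed dyadic products and diagonalizes it with cyclic Jacobi rotations; the Jacobi routine is a choice made for this illustration (any symmetric eigensolver would do), and the synthetic, noise-free stationary points are assumed data.

```python
import math


def outer_sum(s_list):
    """Sum of the dyadic products s_i s_i^T as a 3x3 list of lists."""
    M = [[0.0] * 3 for _ in range(3)]
    for s in s_list:
        for i in range(3):
            for j in range(3):
                M[i][j] += s[i] * s[j]
    return M


def jacobi_eigen(M, sweeps=50):
    """Eigenvalues and eigenvectors (columns of V) of a symmetric 3x3 matrix."""
    A = [row[:] for row in M]
    V = [[float(i == j) for j in range(3)] for i in range(3)]
    for _ in range(sweeps):
        for p, q in ((0, 1), (0, 2), (1, 2)):
            d = A[p][q]
            if d == 0.0:
                continue
            diff = A[q][q] - A[p][p]
            theta = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * d / diff)
            c, s = math.cos(theta), math.sin(theta)
            for k in range(3):   # column update: A <- A J, V <- V J
                A[k][p], A[k][q] = c * A[k][p] - s * A[k][q], s * A[k][p] + c * A[k][q]
                V[k][p], V[k][q] = c * V[k][p] - s * V[k][q], s * V[k][p] + c * V[k][q]
            for k in range(3):   # row update: A <- J^T A
                A[p][k], A[q][k] = c * A[p][k] - s * A[q][k], s * A[p][k] + c * A[q][k]
    return [A[i][i] for i in range(3)], V


def estimate_translation(s_list):
    """Unit eigenvector of sum s_i s_i^T with the smallest eigenvalue."""
    values, V = jacobi_eigen(outer_sum(s_list))
    idx = min(range(3), key=lambda i: values[i])
    return tuple(V[k][idx] for k in range(3)), values[idx]


# Assumed synthetic data: gradients orthogonal to the direction to the FOE (0.2, -0.1).
foe = (0.2, -0.1)
points = [(0.5, 0.2), (-0.4, 0.3), (0.1, -0.6), (-0.2, -0.5)]
s_list = []
for x, y in points:
    Ex, Ey = -(y - foe[1]), x - foe[0]
    s_list.append((-Ex, -Ey, x * Ex + y * Ey))

t_est, lam_min = estimate_translation(s_list)
```

The sign of `t_est` is arbitrary here; as noted above, it would be fixed afterwards by checking which choice makes most depth values positive.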

The least-squares method just described can be interpreted in terms of quantities in
the image plane also. At each stationary point, the tangent to the iso-brightness contour
provides us with a line on which the FOE would lie if there were no measurement error.
In practice these lines will not intersect in a common point due to noise. The position of
the FOE may then be estimated by finding the point with the minimum (weighted) sum
of squares of distances from the lines (see the appendix for more details).
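
The image-plane version can be sketched as a 2 x 2 linear least-squares problem. In the Python sketch below, each stationary point contributes the line through that point orthogonal to the brightness gradient, and the FOE estimate minimizes the sum of squared distances to these lines, weighted by the squared gradient magnitude. This particular weighting is an assumption made here for simplicity; the weighting used in the appendix is not reproduced.

```python
def foe_from_tangent_lines(points, gradients):
    """Least-squares intersection of the lines g . (p - q) = 0.

    points    -- stationary points q = (x, y)
    gradients -- brightness gradients g = (Ex, Ey) at those points
    Each squared distance is weighted by |g|^2 (assumed weighting).
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (x, y), (gx, gy) in zip(points, gradients):
        c = gx * x + gy * y          # the line is gx * X + gy * Y = c
        a11 += gx * gx
        a12 += gx * gy
        a22 += gy * gy
        b1 += gx * c
        b2 += gy * c
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("tangent lines are (nearly) parallel")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)


# Assumed noise-free data: all tangent lines pass through the FOE (0.2, -0.1).
foe_true = (0.2, -0.1)
points = [(0.5, 0.2), (-0.4, 0.3), (0.1, -0.6)]
gradients = [(-(y - foe_true[1]), x - foe_true[0]) for x, y in points]
foe_est = foe_from_tangent_lines(points, gradients)
```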

4.2  Constraints  Imposed  by  Brightness  Gradient  Vectors 

We first assume that two translational motions and two surfaces satisfy the brightness
change equation; that is, we have

$$c + \frac{1}{Z}(\mathbf{s} \cdot \mathbf{t}) = 0 \quad \text{and} \quad c + \frac{1}{Z'}(\mathbf{s} \cdot \mathbf{t}') = 0.$$

Here, $\{Z > 0, \mathbf{t}\}$ denotes the true solution and $\{Z' > 0, \mathbf{t}'\}$ denotes a spurious (or
assumed) solution. We will show that we must have $Z = kZ'$ and $\mathbf{t} = k\mathbf{t}'$, for some
non-zero constant k, provided that there is sufficient texture and that we consider a large
enough region of the image. This means that the solution is unique up to the scale-factor
ambiguity.

Solving for Z and Z' we obtain

$$Z = -\frac{1}{c}(\mathbf{s} \cdot \mathbf{t}) \quad \text{and} \quad Z' = -\frac{1}{c}(\mathbf{s} \cdot \mathbf{t}').$$

The depth value cannot be computed at a point where c = 0; that is, at a stationary
point. We already know how to exploit the information at these points and so exclude
them from further consideration, that is, we assume from now on that $c \neq 0$.

Since Z is the true solution, we are guaranteed that Z > 0. If $\{Z', \mathbf{t}'\}$ is to be an
acceptable solution, we must also have Z' > 0 and so

$$ZZ' = \frac{1}{c^2}(\mathbf{s} \cdot \mathbf{t})(\mathbf{s} \cdot \mathbf{t}') > 0.$$

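The focus of expansion (FOE) is the intersection of the translational velocity vector
t and the image plane z = 1. It lies at

$$\bar{\mathbf{t}} = \frac{1}{\mathbf{t} \cdot \hat{\mathbf{z}}}\,\mathbf{t},$$

provided that $\mathbf{t} \cdot \hat{\mathbf{z}} \neq 0$ (otherwise, it is at infinity in the direction given by the vector t).
We can similarly write

$$\bar{\mathbf{t}}' = \frac{1}{\mathbf{t}' \cdot \hat{\mathbf{z}}}\,\mathbf{t}'$$

for the focus of expansion corresponding to the assumed translational velocity t' (provided
again that $\mathbf{t}' \cdot \hat{\mathbf{z}} \neq 0$). We can write $\mathbf{s} = (\mathbf{E}_r \times \hat{\mathbf{z}}) \times \mathbf{r}$ in the form

$$\mathbf{s} = (\mathbf{r} \cdot \mathbf{E}_r)\,\hat{\mathbf{z}} - (\mathbf{r} \cdot \hat{\mathbf{z}})\,\mathbf{E}_r,$$

and, noting that $\mathbf{r} \cdot \hat{\mathbf{z}} = 1$, we obtain

$$\mathbf{s} = (\mathbf{r} \cdot \mathbf{E}_r)\,\hat{\mathbf{z}} - \mathbf{E}_r.$$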

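Therefore, we have

$$\mathbf{s} \cdot \mathbf{t} = (\mathbf{r} \cdot \mathbf{E}_r)(\mathbf{t} \cdot \hat{\mathbf{z}}) - \mathbf{E}_r \cdot \mathbf{t},$$

that is,

$$\mathbf{s} \cdot \mathbf{t} = (\mathbf{t} \cdot \hat{\mathbf{z}})\,\big((\mathbf{r} - \bar{\mathbf{t}}) \cdot \mathbf{E}_r\big).$$

Similarly, we obtain

$$\mathbf{s} \cdot \mathbf{t}' = (\mathbf{t}' \cdot \hat{\mathbf{z}})\,\big((\mathbf{r} - \bar{\mathbf{t}}') \cdot \mathbf{E}_r\big).$$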

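Substituting these into the inequality ZZ' > 0 we arrive at

$$(\mathbf{t} \cdot \hat{\mathbf{z}})(\mathbf{t}' \cdot \hat{\mathbf{z}})\,\big((\mathbf{r} - \bar{\mathbf{t}}) \cdot \mathbf{E}_r\big)\big((\mathbf{r} - \bar{\mathbf{t}}') \cdot \mathbf{E}_r\big) > 0.$$

If $(\mathbf{t} \cdot \hat{\mathbf{z}})$ and $(\mathbf{t}' \cdot \hat{\mathbf{z}})$ have the same sign, we must have

$$\big((\mathbf{r} - \bar{\mathbf{t}}) \cdot \mathbf{E}_r\big)\big((\mathbf{r} - \bar{\mathbf{t}}') \cdot \mathbf{E}_r\big) > 0.$$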


For convenience, we denote the term on the left-hand side of the inequality $\mathcal{P}$ from here
on. So for ZZ' > 0 we must have $\mathcal{P} > 0$. (Note that if $(\mathbf{t} \cdot \hat{\mathbf{z}})$ and $(\mathbf{t}' \cdot \hat{\mathbf{z}})$ have opposite
signs, the inequality is reversed.) Without loss of generality, we assume from now on that
the above constraint holds; the proof is similar in the opposite case, as we will indicate.
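
The rewriting of $\mathbf{s} \cdot \mathbf{t}$ in terms of the FOE, and the resulting sign test on ZZ', can be checked numerically. The Python sketch below is illustrative only; the function names and sample values are assumptions.

```python
def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


def s_vector(r, Er):
    """s = (r . E_r) z - E_r for r = (x, y, 1) and E_r = (Ex, Ey, 0)."""
    return (-Er[0], -Er[1], dot(r, Er))


def foe(t):
    """Focus of expansion t / (t . z); assumes t . z is non-zero."""
    return (t[0] / t[2], t[1] / t[2], 1.0)


def s_dot_t_via_foe(r, Er, t):
    """(t . z) ((r - t_bar) . E_r), which should equal s . t."""
    t_bar = foe(t)
    diff = tuple(r[i] - t_bar[i] for i in range(3))
    return t[2] * dot(diff, Er)


def depth_sign_product(r, Er, t, t_prime):
    """(s . t)(s . t'); it has the same sign as Z Z' because 1 / c^2 > 0."""
    s = s_vector(r, Er)
    return dot(s, t) * dot(s, t_prime)


# Assumed image point, gradient, true translation and spurious translation.
r = (0.4, -0.3, 1.0)
Er = (1.5, 0.7, 0.0)
t = (0.2, -0.1, 1.0)
t_prime = (-0.5, 0.3, 0.8)

direct = dot(s_vector(r, Er), t)
via_foe = s_dot_t_via_foe(r, Er, t)
compatible = depth_sign_product(r, Er, t, t_prime) > 0   # True if Z' > 0 here
```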

For t' to be a possible translational motion, the inequality developed above must hold
for every point r in the image region under consideration, that is, $\mathcal{P} > 0$. At each point,
$\mathbf{E}_r$ is constrained to lie in a direction that guarantees that $((\mathbf{r} - \bar{\mathbf{t}}) \cdot \mathbf{E}_r)$ and $((\mathbf{r} - \bar{\mathbf{t}}') \cdot \mathbf{E}_r)$
have the same sign. In practice, a sufficiently large image region will contain some
image brightness gradients that violate this constraint unless $\bar{\mathbf{t}} = \bar{\mathbf{t}}'$. We will estimate
the probability that an arbitrarily-chosen brightness gradient will violate this constraint.
This probability varies spatially and we show that there is a line segment in the image
along which the probability of violating the constraint becomes one. Furthermore, we
exploit the distribution in the image of places where Z' < 0 to obtain an estimate of the
true FOE.
4.2.1  Permissible and Forbidden Ranges

Let

$$\mathbf{x} = \bar{\mathbf{t}} - \mathbf{r} \quad \text{and} \quad \mathbf{x}' = \bar{\mathbf{t}}' - \mathbf{r}.$$

The vectors x and x' represent the line segments from a point P in the image, with
position $\mathbf{r} = (x, y, 1)^T$, to the true and spurious FOEs, respectively. These are the
segments PF and PF' in Figure 2a. Note that the scalar product $(\mathbf{x} \cdot \mathbf{E}_r)$ is positive
when the angle between x and the brightness gradient vector at point P is less than $\pi/2$,
and negative when the angle is greater than $\pi/2$. It is zero when x is orthogonal
to the gradient vector. Similarly, the dot product $(\mathbf{x}' \cdot \mathbf{E}_r)$ is positive, negative, or zero
when the angle between x' and the gradient vector at point P is less than, greater than,
or equal to $\pi/2$.

We have, from the discussion in the previous section, the constraint $\mathcal{P} > 0$ or,
equivalently,

$$(\mathbf{x} \cdot \mathbf{E}_r)(\mathbf{x}' \cdot \mathbf{E}_r) > 0$$

(if, as assumed, $(\mathbf{t} \cdot \hat{\mathbf{z}})$ and $(\mathbf{t}' \cdot \hat{\mathbf{z}})$ have the same sign). Now suppose that
we define directions in the image plane orthogonal to the vectors x and x' as follows
(see Figure 2b):

$$\mathbf{p} = \mathbf{x} \times \hat{\mathbf{z}} \quad \text{and} \quad \mathbf{p}' = \mathbf{x}' \times \hat{\mathbf{z}}.$$

The vector p gives the direction of a line that divides the possible directions of $\mathbf{E}_r$ into
two ranges with differing signs for $(\mathbf{x} \cdot \mathbf{E}_r)$. Similarly, the vector p' gives the direction of
a line that divides the possible directions of $\mathbf{E}_r$ into two ranges with differing signs for
$(\mathbf{x}' \cdot \mathbf{E}_r)$.

Figure  3.  Permissible  and  forbidden  ranges  for  the  brightness  gradient  direction. 

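Unless p happens to be parallel to p', we can express an arbitrary gradient vector $\mathbf{E}_r$
in the form

$$\mathbf{E}_r = \alpha\,\mathbf{p} + \beta\,\mathbf{p}',$$

for some constants $\alpha$ and $\beta$. Then

$$(\mathbf{x} \cdot \mathbf{E}_r)(\mathbf{x}' \cdot \mathbf{E}_r) = -\alpha\beta\,\|\mathbf{x} \times \mathbf{x}'\|^2.$$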

We see that the product denoted $\mathcal{P}$ is positive when $\mathbf{E}_r$ lies between p and -p' ($\alpha > 0$
and $\beta < 0$) and when $\mathbf{E}_r$ lies between -p and p' ($\alpha < 0$ and $\beta > 0$). The union of these
two ranges is called the permissible range for $\mathbf{E}_r$ since it leads to positive depth values.
Conversely, the product will be negative when $\mathbf{E}_r$ lies between p and p' ($\alpha > 0$ and
$\beta > 0$) and when $\mathbf{E}_r$ lies between -p and -p' ($\alpha < 0$ and $\beta < 0$). The union of these
two ranges is called the forbidden range for $\mathbf{E}_r$ since it leads to negative depth values.
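
The classification into permissible and forbidden ranges, and the identity relating the product to $\alpha$ and $\beta$, can be illustrated in image-plane coordinates. In the Python sketch below, the function names and sample points are assumptions; vectors are two-dimensional because x, x', p, p' and $\mathbf{E}_r$ all have zero z-component, and $(\mathbf{t} \cdot \hat{\mathbf{z}})$ and $(\mathbf{t}' \cdot \hat{\mathbf{z}})$ are taken to have the same sign.

```python
def dot2(a, b):
    return a[0] * b[0] + a[1] * b[1]


def cross2(a, b):
    """z-component of the cross product of two image-plane vectors."""
    return a[0] * b[1] - a[1] * b[0]


def foe_directions(P, F, F_prime):
    """Vectors x and x' from image point P to the true and assumed FOEs."""
    return (F[0] - P[0], F[1] - P[1]), (F_prime[0] - P[0], F_prime[1] - P[1])


def classify_gradient(P, F, F_prime, Er):
    """'permissible' if (x.E_r)(x'.E_r) > 0, 'forbidden' if < 0, else 'boundary'."""
    x, xp = foe_directions(P, F, F_prime)
    product = dot2(x, Er) * dot2(xp, Er)
    if product > 0:
        return "permissible"
    if product < 0:
        return "forbidden"
    return "boundary"


def alpha_beta(P, F, F_prime, Er):
    """Coefficients of E_r = alpha p + beta p', with p = x x z and p' = x' x z."""
    x, xp = foe_directions(P, F, F_prime)
    p, pp = (x[1], -x[0]), (xp[1], -xp[0])
    det = cross2(p, pp)              # zero only if p is parallel to p'
    return cross2(Er, pp) / det, cross2(p, Er) / det


# Assumed configuration: P at the origin, FOEs on the two coordinate axes.
P, F, F_prime = (0.0, 0.0), (1.0, 0.0), (0.0, 1.0)
label_a = classify_gradient(P, F, F_prime, (1.0, 1.0))
label_b = classify_gradient(P, F, F_prime, (1.0, -1.0))
alpha, beta = alpha_beta(P, F, F_prime, (1.0, 1.0))
```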

Denoting the half-planes separated by the line parallel to p by $H^+$ and $H^-$ and those
separated by the line parallel to p' by $H'^+$ and $H'^-$, we define regions $R_1, \ldots, R_4$ as
follows:

$$R_1 = H^+ \cap H'^+, \quad R_2 = H^+ \cap H'^-,$$

and

$$R_3 = H^- \cap H'^-, \quad R_4 = H^- \cap H'^+.$$

We see that $R_1 \cup R_3$ is the permissible range for $\mathbf{E}_r$, because $\mathbf{E}_r$ has to lie in this
region in order to satisfy the constraint $\mathcal{P} > 0$. Conversely, the region $R_2 \cup R_4$ is the
forbidden range for $\mathbf{E}_r$ since Z' < 0 when $\mathbf{E}_r$ lies in this region (see Figure 3). (Note
that the permissible range will be the region consisting of $R_2$ and $R_4$, and the forbidden
range will consist of $R_1$ and $R_3$, when $(\mathbf{t} \cdot \hat{\mathbf{z}})(\mathbf{t}' \cdot \hat{\mathbf{z}}) < 0$.)

We now show that if $\bar{\mathbf{t}} \neq \bar{\mathbf{t}}'$, then the vector $\mathbf{E}_r$ has to lie in the forbidden region (and,
therefore, Z' < 0) for some image points. Therefore, we must have $\bar{\mathbf{t}} = \bar{\mathbf{t}}'$ to guarantee
that Z' > 0 for every image point. In this case, $Z = kZ'$ for some non-zero constant k.
This implies that

$$\frac{1}{\mathbf{t} \cdot \hat{\mathbf{z}}}\,\mathbf{t} = \frac{1}{\mathbf{t}' \cdot \hat{\mathbf{z}}}\,\mathbf{t}',$$

or $\mathbf{t} = k\mathbf{t}'$. Since this means that we can recover the translational motion up to a scale
factor, we conclude that the solution is unique up to the scale-factor ambiguity.


4.2.2  Distribution of Points Violating the Inequality Constraint

Suppose now that the point P lies along the line passing through F and F', which we
refer to as the FOE constraint line. Then we have

$$\mathbf{r} = (1 - \gamma)\,\bar{\mathbf{t}} + \gamma\,\bar{\mathbf{t}}',$$

for some $\gamma$. We see that $0 < \gamma < 1$ when the point P lies on the segment between the
points F and F'. Also $\gamma < 0$ if P lies on the ray emanating from F (segment FX) and
$\gamma > 1$ if it lies on the ray emanating from F' (segment F'X'). For points on the FOE
constraint line, we have

$$\mathbf{x} = \bar{\mathbf{t}} - \mathbf{r} = \gamma\,(\bar{\mathbf{t}} - \bar{\mathbf{t}}') \quad \text{and} \quad \mathbf{x}' = \bar{\mathbf{t}}' - \mathbf{r} = (\gamma - 1)(\bar{\mathbf{t}} - \bar{\mathbf{t}}').$$

The product of interest to us here, $\mathcal{P}$, is then given by

$$(\mathbf{x} \cdot \mathbf{E}_r)(\mathbf{x}' \cdot \mathbf{E}_r) = \gamma(\gamma - 1)\,\big((\bar{\mathbf{t}} - \bar{\mathbf{t}}') \cdot \mathbf{E}_r\big)^2.$$

It is clear that $\mathcal{P}$ will be negative when $0 < \gamma < 1$, unless the gradient vector is orthogonal
to FF' (note that FF' lies along the vector $(\bar{\mathbf{t}} - \bar{\mathbf{t}}')$). The point P is a stationary point if the
gradient vector is orthogonal to the line PF, and we have excluded such points from
consideration. This implies that, for points on the line segment FF', the depth values
Z' are guaranteed to be negative (unless the point happens to be a stationary point).
The product $\mathcal{P}$ is positive when $\gamma < 0$ or $\gamma > 1$. So in this case the depth values are
guaranteed to be positive for points along the rays FX and F'X', unless the point is a
stationary point. (The situation is reversed when $(\mathbf{t} \cdot \hat{\mathbf{z}})(\mathbf{t}' \cdot \hat{\mathbf{z}}) < 0$; that is, positive depth
values along FF' and negative ones along the rays FX and F'X'.)

A probability value can be assigned to each image point as a measure of the likelihood
that Z' < 0 at that image point. Since Z' < 0 if the gradient vector lies outside the
permissible range, we can conclude that the probability distribution function depends on
$\theta$, the angle between the vectors x and x', as well as on the distribution of the brightness
gradients.

Figure 4. Relationship between the size of the permissible range and the relative position of an image
point with respect to the FOE constraint line.

When $\theta$ is small, the permissible range for $\mathbf{E}_r$ consists of a large set of allowed directions
(see Figure 4a). Therefore, the points where $\theta$ is small are likely to have positive
depth values even for an incorrect translational vector t'. These are points that are either
at some distance laterally from the FOE constraint line or are in the vicinity of the two
rays FX and F'X'.

Conversely, when θ is large, the permissible range for E_r comprises a small set of
directions (see Figure 4b). Therefore, it is more likely that the brightness gradient lies
outside this range, giving rise to negative depth values. In the extreme case when θ = π
(that is, the point lies along FF') the depth values are guaranteed to be negative (unless
the point is a stationary point). The forbidden range for a point on FF' contains all
possible directions for E_r excluding only the line orthogonal to FF'.

Suppose that the probability distribution of the gradient vectors is independent of
the image position and is rotationally symmetric; that is, all directions of the brightness
gradients are equally likely. It is not difficult to see that the probability that a point in
the image plane gives rise to a negative depth value is then given by

$$\mathrm{Prob}(Z' < 0) = \frac{\theta}{\pi}.$$

A chord of a circle subtends a constant angle. It follows that the constant probability
loci are circles that pass through F and F', and that there is symmetry about the FOE
constraint line (see Figure 5a).

This can be shown algebraically as follows: Let Q be the projection of an image point
P on XX', and let O be the midpoint of the FOE constraint line FF' (see Figure 5b).
Further, let θ_1 be the angle between PF and PQ, while θ_2 is the angle between PF' and
PQ, and define

$$f = \tfrac{1}{2}|FF'|, \quad h = |PQ|, \quad \text{and} \quad s = |OQ|.$$

Then, we have

$$\tan\theta_1 = \frac{f - s}{h} \quad \text{and} \quad \tan\theta_2 = \frac{f + s}{h}.$$

Using the identity

$$\tan\theta = \tan(\theta_1 + \theta_2) = \frac{\tan\theta_1 + \tan\theta_2}{1 - \tan\theta_1 \tan\theta_2},$$

we arrive at

$$\tan\theta = \frac{2hf}{s^2 + h^2 - f^2}.$$

The locus of points with constant θ (and, equivalently, constant tan θ) is thus determined
by the equation

$$s^2 + h^2 - f^2 = 2khf,$$

for some constant k (here k = 1/tan θ). This can be written in the form

$$s^2 + (h - fk)^2 = (1 + k^2) f^2,$$

which is the equation of a circle centered at (s, h) = (0, kf) that passes through (s, h) =
(f, 0) and (s, h) = (-f, 0). Solving for θ we obtain

$$\theta = \tan^{-1}\!\left(\frac{2hf}{s^2 + h^2 - f^2}\right),$$

with the inverse tangent taken in the range 0 to π, and, therefore,

$$\mathrm{Prob}(Z' < 0) = \frac{1}{\pi}\tan^{-1}\!\left(\frac{2hf}{s^2 + h^2 - f^2}\right).$$

For constant s (where s < f), this function has a maximum of 1 for h = 0; that is, on
the line segment FF'.
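The closed form above can be checked numerically. The following is a minimal sketch, not part of the original method: it assumes F and F' lie at (f, 0) and (-f, 0) in the (s, h) coordinates, that all gradient directions are equally likely, and that the sign of the computed depth equals the sign of (E_r · x)(E_r · x'). The function names are hypothetical.

```python
import math


def prob_negative_depth(s, h, f):
    """Closed-form probability theta/pi at the point (s, h).

    F and F' are assumed to lie at (f, 0) and (-f, 0). atan2 is used so
    that theta falls in [0, pi] even when s^2 + h^2 - f^2 is negative.
    """
    theta = math.atan2(2.0 * abs(h) * f, s * s + h * h - f * f)
    return theta / math.pi


def fraction_negative(s, h, f, n_dirs=3600):
    """Fraction of evenly spaced gradient directions giving negative depth.

    The depth sign is taken to be the sign of (E_r . x)(E_r . x'), where
    x and x' are the vectors from F and F' to the point (an assumption).
    """
    x = (s - f, h)    # vector from F to P
    xp = (s + f, h)   # vector from F' to P
    negative = 0
    for k in range(n_dirs):
        phi = 2.0 * math.pi * (k + 0.5) / n_dirs
        e = (math.cos(phi), math.sin(phi))
        if (e[0] * x[0] + e[1] * x[1]) * (e[0] * xp[0] + e[1] * xp[1]) < 0.0:
            negative += 1
    return negative / n_dirs


def point_on_circle(k, f, phi):
    """Point on the circle centered at (0, k f) that passes through F and F'."""
    radius = f * math.sqrt(1.0 + k * k)
    return (radius * math.cos(phi), k * f + radius * math.sin(phi))
```

Evaluating `prob_negative_depth` at several points returned by `point_on_circle` for a fixed k (on the arc with h > 0) should give the same value each time, which is the constant-probability property derived above.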

To  summarize,  we  have  shown  that  there  are  points  in  the  image  that  give  rise  to  a 
negative  depth  value  if  an  incorrect  translation  vector  (t')  is  assumed.  These  points  are 
more  likely  to  be  found  in  the  vicinity  of  the  line  segment  that  connects  the  incorrect 
focus  of  expansion  to  the  true  one  (later  this  is  exploited  to  locate  the  true  focus  of 
expansion).  As  F'  approaches  F,  the  region  around  F F'  that  is  likely  to  contain  points 
with  negative  depth  values  shrinks  in  size.  In  the  limit  when  F'  coincides  with  F,  all 
depth values become positive. (When the product (t · ẑ)(t' · ẑ) is negative, the situation
is reversed. In this case, it is more likely that the points in the vicinity of FF' will give
rise to positive depth values and the points along or in the vicinity of FX and F'X' will
give rise to negative depth values, but otherwise similar conclusions can be drawn.)

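The true FOE is at infinity when t · ẑ = 0. First consider the situation where t' · ẑ = 0
for a spurious solution. Then we have

$$\mathbf{s} \cdot \mathbf{t} = -\mathbf{t} \cdot E_r \quad \text{and} \quad \mathbf{s} \cdot \mathbf{t}' = -\mathbf{t}' \cdot E_r.$$

Using these, we obtain

$$(\mathbf{s} \cdot \mathbf{t})(\mathbf{s} \cdot \mathbf{t}') = (\mathbf{t} \cdot E_r)(\mathbf{t}' \cdot E_r).$$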

The half-planes {H+, H-} and {H'+, H'-} are now defined by the vectors t and t', instead
of x and x' for the case t · ẑ ≠ 0 (that is, we need to replace x and x' by t and t',
respectively, in our earlier analysis). Since these vectors are constants, we conclude that
θ (in this case, this becomes the angle between the two vectors t and t') is the same
for every image point. If the distribution of brightness gradient vectors is rotationally
symmetric and independent of the image position, each image point can give rise to a
negative depth value with probability equal to θ/π. We conclude that the depth values
will be negative for some image points unless t = t'. Similar arguments can be made
when only one of the FOEs lies at infinity.

4.2.3  Locating  the  Focus  of  Expansion  using  Gradient  Vectors

It  is  somewhat  easier  to  locate  the  FOE  when  it  lies  within  the  field  of  view  than  when  it 
lies  outside.  We  first  compute  the  sign  of  the  depth  values  using  an  initial  estimate  of  the 
solution,  t',  in  the  brightness  change  constraint  equation.  We  then  determine  the  cluster 
of  negative  depth  values.  The  first  method  to  be  presented  here  uses  the  fact  that  the 
centroid  of  this  cluster  is  expected  to  lie  half-way  between  the  true  FOE  and  the  assumed 
FOE. That is, because of the symmetry of the probability distribution, we have for the
expected position of the centroid

$$\bar{t} = \tfrac{1}{2}(t + t').$$

Then the position of the FOE can be estimated using:

$$t = 2\bar{t} - t'.$$

This estimate will be biased if the border of the image cuts off a significant portion of the
cluster. Nevertheless, a simple iterative scheme can be based on the above approximation
that updates the estimate as follows:

$$(t')^{n+1} = 2(\bar{t})^{n} - (t')^{n},$$

where $(\bar{t})^{n}$ is the centroid of the cluster of points with negative depth values obtained
using the estimate $(t')^{n}$ for the FOE. The cluster will shrink at each iteration, so in
subsequent  computations  we  may  restrict  attention  to  the  image  region  containing  the 
major  portion  of  the  previous  cluster  rather  than  the  whole  of  the  initial  image  region 
under  consideration. 
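The centroid update can be sketched as follows. This is an illustrative sketch under stated assumptions, not the authors' implementation: it assumes pure translation, positive true depths, and (t · ẑ)(t' · ẑ) > 0, so that the depth sign at a point equals the sign of (E_r · x)(E_r · x'). The true FOE appears here only to synthesize the depth signs; with real data the signs would come from the brightness change constraint equation. All function names are hypothetical.

```python
def depth_is_negative(p, grad, foe_true, foe_assumed):
    """Depth sign test: negative when (E_r . x)(E_r . x') < 0."""
    a = grad[0] * (p[0] - foe_true[0]) + grad[1] * (p[1] - foe_true[1])
    b = grad[0] * (p[0] - foe_assumed[0]) + grad[1] * (p[1] - foe_assumed[1])
    return a * b < 0.0


def negative_centroid(points, grads, foe_true, foe_assumed):
    """Centroid of the points with negative depth, or None if there are none."""
    neg = [p for p, g in zip(points, grads)
           if depth_is_negative(p, g, foe_true, foe_assumed)]
    if not neg:
        return None
    return (sum(p[0] for p in neg) / len(neg),
            sum(p[1] for p in neg) / len(neg))


def update_foe(centroid, foe_assumed):
    """One step of the update (t')^{n+1} = 2 (t-bar)^n - (t')^n."""
    return (2.0 * centroid[0] - foe_assumed[0],
            2.0 * centroid[1] - foe_assumed[1])


def iterate_foe(points, grads, foe_true, foe_init, n_iter=10):
    """Repeat the update until no negative depth values remain."""
    estimate = foe_init
    for _ in range(n_iter):
        centroid = negative_centroid(points, grads, foe_true, estimate)
        if centroid is None:
            break
        estimate = update_foe(centroid, estimate)
    return estimate
```

The sketch uses the whole point set at every iteration; restricting attention to the region of the previous cluster, as suggested above, would be a refinement of `negative_centroid`.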

Other  methods  we  have  investigated  work  even  when  the  FOE  is  outside  the  field  of 
view.  Suppose  that  we  identify  at  least  two  FOE  constraint  lines  corresponding  to  two 
assumed  FOEs.  The  intersection  of  these  lines  will  be  the  estimated  FOE.  In  practice 
more  than  two  FOE  constraint  lines  are  used  to  reduce  the  effects  of  measurement  error. 
These  lines  will  no  longer  all  intersect  in  a  common  point  because  of  noise  in  the  images, 
quantization  error,  and  error  in  the  estimate  of  brightness  derivatives.  It  makes  sense 
then to choose as the estimate of the true FOE the point with the least sum of squares
of distances from the constraint lines.
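The least-squares intersection can be sketched as follows. This is a minimal illustration, not the authors' code: each constraint line is assumed to be given by a point on it (for example, the assumed FOE) and a direction angle, and the function name is hypothetical. The point minimizing the sum of squared perpendicular distances is found by solving the 2 x 2 normal equations directly.

```python
import math


def least_squares_intersection(lines):
    """Point with the least sum of squared distances from the given lines.

    lines: iterable of ((qx, qy), angle), where (qx, qy) is a point on the
    line and angle is the direction of the line in radians.
    """
    a11 = a12 = a22 = b1 = b2 = 0.0
    for (qx, qy), angle in lines:
        nx, ny = -math.sin(angle), math.cos(angle)  # unit normal to the line
        d = nx * qx + ny * qy                       # signed offset of the line
        a11 += nx * nx
        a12 += nx * ny
        a22 += ny * ny
        b1 += nx * d
        b2 += ny * d
    det = a11 * a22 - a12 * a12
    if abs(det) < 1e-12:
        raise ValueError("constraint lines are (nearly) parallel")
    return ((a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det)
```

With noise-free lines that all pass through one point, the function returns that point; with noisy lines it returns a compromise location.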

The axis of symmetry or axis of least inertia of the clusters of positive and negative
depth values for a particular assumed FOE can be chosen as the FOE constraint line.
Alternatively, we may employ a direction histogram method. In this case, we need to
determine the line through the assumed FOE along which the largest number of negative
depth values are found on one side of the assumed FOE, and the largest number of
positive depth values on the other side (Negahdaripour [1986]).

To summarize, we first choose arbitrary points in the image as estimates of the FOE.
For each assumed FOE, we determine the signs of the depth values at each image point
using the brightness change equation, and then the FOE constraint line using a
clustering technique or a histogram method. Finally, we use the best estimate of the
common intersection of the constraint lines corresponding to the assumed FOEs as the
best estimate of the FOE.
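One way to obtain a constraint line from a cluster is the axis of least inertia. The sketch below is an assumption-laden illustration rather than the procedure actually used: it takes the coordinates of the points with negative depth values, computes second central moments about their centroid, and returns the centroid together with the orientation of the axis of least inertia. The function name is hypothetical.

```python
import math


def least_inertia_axis(points):
    """Centroid and orientation (radians) of the axis of least inertia.

    points: list of (x, y) coordinates, e.g. the negative-depth cluster.
    The orientation follows from tan(2 * angle) = 2 * mu11 / (mu20 - mu02).
    """
    if not points:
        raise ValueError("empty cluster")
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    mu20 = sum((p[0] - cx) ** 2 for p in points)
    mu02 = sum((p[1] - cy) ** 2 for p in points)
    mu11 = sum((p[0] - cx) * (p[1] - cy) for p in points)
    angle = 0.5 * math.atan2(2.0 * mu11, mu20 - mu02)
    return (cx, cy), angle
```

The returned line passes through the cluster centroid. To force the line through the assumed FOE instead, the moments could be taken about that point; that variant is not shown.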
Suppose an upper bound for each component of the rotational vector ω is available;
for example, it is known that |ω_i| < ω_i^max. If each interval from -ω_i^max to ω_i^max is divided
into n smaller intervals, we can restrict the search to the n³ discrete points in ω-space.
Let us denote a point in this space by ω_ijk for i, j, k = 1, 2, ..., n. For each possible
rotation in this space (that is, for each ω_ijk) we estimate the location of the FOE using
the method given earlier. We store the value of the error e(ω_ijk) for the best FOE.
The best estimate of the rotation corresponds to a minimum of the error
e(ω_ijk). To obtain an even more accurate result we may perform a local search in the
neighborhood of this minimum.
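The exhaustive search over ω-space can be sketched as follows. This is a hypothetical outline only: `error_fn` stands in for whatever error measure is associated with the best FOE for a given assumed rotation, and the grid points are taken to be the midpoints of the n sub-intervals on each axis, which is an assumption not stated in the text.

```python
import itertools


def grid_search_rotation(error_fn, omega_max, n):
    """Evaluate error_fn on an n x n x n grid and return the best rotation.

    error_fn:  hypothetical callable mapping (wx, wy, wz) to an error value.
    omega_max: upper bounds (wx_max, wy_max, wz_max) on |omega_i|.
    n:         number of sub-intervals per axis.
    """
    axes = [[-m + 2.0 * m * (i + 0.5) / n for i in range(n)]
            for m in omega_max]
    best_omega, best_error = None, None
    for omega in itertools.product(*axes):
        e = error_fn(omega)
        if best_error is None or e < best_error:
            best_omega, best_error = omega, e
    return best_omega, best_error
```

A local search around the returned rotation, as mentioned above, could then refine the result.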

6  Selected  Examples

In the following examples, we show that it is possible to determine the location of the true FOE
from the distribution of the clusters of positive and negative depth values around the
assumed FOE. In these examples, we have used synthetic data so that the underlying
motion is known exactly. The focal length is assumed to be unity and the image plane is
a square divided into 61 rows of 64 picture cells. The half-angle of the field of view
is tan⁻¹ 0.5 ≈ 27°. The positive x-axis points towards the right and the positive
y-axis points downward. Positive depth values and the spatial brightness derivatives
were chosen randomly. The depth values vary in a range of one to nine units. The
time derivative c = E_t of image brightness was computed using the brightness change
constraint equation,

$$c = -\left(\mathbf{v} \cdot \boldsymbol{\omega} + \frac{1}{Z}(\mathbf{s} \cdot \mathbf{t})\right).$$

To simulate the effect of noise, random noise was added to both E_r and c = E_t.

6.1  Example  One:  Focus  of  Expansion  in  the  Image

In this example, we consider an observer approaching a scene; the motion parameters are
ω = (0, 0, 0)^T and t = (0, 0, 1)^T; that is, there is no rotation and the focus of expansion
is at the center of the image plane. Figure 6 shows the regions of negative (white) and
positive (black) depth values for several assumed FOEs. The diagrams in columns one
through four show the results when the added noise has a mean of about 20%, 40%, 60%,
and 80%, respectively. These plots show that the negative and positive depth values form
symmetric clusters with respect to the line from the assumed FOE to the true FOE. Using
this, it is possible to estimate the location of the true FOE with good accuracy
even when there is as much as 80% noise in the data.
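The synthetic setup for the pure translation case can be sketched as follows. This is a simplified, noise-free illustration and not the code used for the figures. It assumes the component form s = (-E_x, -E_y, x E_x + y E_y)^T, which is consistent with s · t = -t · E_r when t · ẑ = 0, and it assumes ω = 0 so that c = -(1/Z)(s · t). The grid size, value ranges, and all names are assumptions.

```python
import random


def s_vector(x, y, ex, ey):
    # Assumed component form of s; it gives s.t = -t.E_r when t.z = 0.
    return (-ex, -ey, x * ex + y * ey)


def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def make_sample(x, y, ex, ey, depth, t):
    """One noise-free sample for pure translation: c = -(1/Z)(s . t)."""
    c = -dot(s_vector(x, y, ex, ey), t) / depth
    return {"x": x, "y": y, "ex": ex, "ey": ey, "c": c, "depth": depth}


def make_image(t, n=64, seed=0):
    """Random gradients and depths (one to nine units) on an n x n grid."""
    rng = random.Random(seed)
    samples = []
    for row in range(n):
        for col in range(n):
            x = (col + 0.5) / n - 0.5   # image spans -0.5 .. 0.5
            y = (row + 0.5) / n - 0.5
            samples.append(make_sample(x, y,
                                       rng.uniform(-1.0, 1.0),
                                       rng.uniform(-1.0, 1.0),
                                       rng.uniform(1.0, 9.0), t))
    return samples


def depth_for_assumed(sample, t_assumed):
    """Depth Z' = -(s . t') / c implied by an assumed translation t'."""
    if sample["c"] == 0.0:
        return None  # stationary point: depth is not determined
    s = s_vector(sample["x"], sample["y"], sample["ex"], sample["ey"])
    return -dot(s, t_assumed) / sample["c"]
```

Collecting the points where `depth_for_assumed` is negative for some incorrect t' gives the kind of cluster shown in white in the figures; adding noise to the gradients and to c would be the next step toward reproducing the experiments.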

6.2  Example  Two:  Focus  of  Expansion  at  Infinity

In this example, ω = (.1, 0, 0)^T and t = (0, 1, 0)^T; that is, the focus of expansion is at
infinity along the negative y-axis. Here the rotational component is non-zero but assumed
to be known. Figure 7 shows the regions of negative (white) and positive (black) depth values
for several assumed FOEs with random noise added to the brightness derivatives. The
diagrams in columns one through four show the results when the added noise has a mean
of about 20%, 40%, 60%, and 80%, respectively. We can determine the direction toward
the FOE at infinity with good accuracy with as much as 40% noise in the data. The
results deteriorate to some extent with 60% noise for some of the assumed FOEs. With
80% noise in the data, it is hard to define the clusters of positive and negative depth
values, so the FOE cannot be located accurately.


6.3  Example  Three:  Unknown  Rotation

In this example, we investigate the sensitivity of the solution to local variations due to
errors in the rotational parameters. The motion is toward the scene with no rotation (as in
Example One) so that the true FOE is at the origin of the image plane. The depth values
vary in a range from one to nine units with an average of about five units. To study the
dependency of the solution on the choice of the rotational vector, the procedure given in
the previous examples was repeated for six values of the rotational vector with 40% noise
in the data. The results are shown in Figure 8. Again, the regions with negative depth
are shown in white and the regions with positive depth are in black.

The first column shows the results for an assumed rotation of ω' = (.05, 0, 0)^T. These
results show that the estimated FOE is located on the positive y-axis around y = 0.25
(the axis of symmetry of the clusters of negative depth values for the assumed
FOEs intersects the y-axis around y = 0.25). Interestingly, this is consistent with a
translation of t' = (0, .25, 1)^T. Therefore, we have overestimated the rotation about the
x-axis by 0.05 radians and the translation along the y-axis by about 0.25 units. As explained
earlier, with noisy data, it is possible to interpret a rotation about the positive x-axis as
a translation in the direction of the negative y-axis, scaled by the distance of the object
from the viewer (note that the average of depth values is about five units). In this case,
we need to add a translation in the positive y direction to offset the rotation about the
positive x-axis.

The second column shows the results for an assumed rotation of ω' = (-.05, 0, 0)^T.
In this case, the estimated FOE is along the negative y-axis at a distance of about 0.25
from the origin. This is consistent with a translation vector t' = (0, -0.25, 1)^T.
A similar conclusion as in the previous case can be made; we need to add a translation
in the negative y direction to offset the rotation about the negative x-axis.

The third column shows the results for an assumed rotation of ω' = (0, 0.05, 0)^T. In
this case, the FOE constraint lines do not seem to have a common intersection point. The
conclusion is that the assumed rotation cannot be correct. The
situation is different for ω' = (0, -0.05, 0)^T (the results are shown in the fourth column).
In this case, the axes of symmetry of the negative depth clusters seem to intersect around
(0.25, 0)^T. This is consistent with a translation of t' = (0.25, 0, 1)^T.

Again, with noisy data, it is possible to interpret a rotation about the negative y-axis
as a translation in the negative x direction scaled by the distance of the object from the
viewer. In this case, we need to add a translation in the positive x direction to offset the
rotation about the negative y-axis.

The remaining plots from the leftmost column to the rightmost column (Figure 8,
continued) are for an assumed rotation of ω = (0, 0, 0.05)^T, ω = (0, 0, -0.05)^T,
ω = (0, 0, 0.1)^T, and ω = (0, 0, -0.1)^T, respectively.

A careful review of these plots reveals that, for each assumed rotation, the FOE
constraint lines do not intersect at a common point, but seem to intersect in points lying
on a circle centered at the origin with radius proportional to the assumed rotation rate
about the optical axis. To explain this, we need to remember that a rotation about
the optical axis generates motion field vectors that are tangent to concentric circles with
center at the FOE (the origin in this case). For a rotation of the viewer about the positive
z-axis (the optical axis) the motion field vectors travel counterclockwise. Conversely, they
are clockwise for a rotation about the negative z-axis. Take rotation about the positive
z-axis, for example (results shown in the first and third columns). Along the negative
y-axis (remember this points upward) the motion field vectors point from right to left
(and increase in magnitude linearly with y). This is indicated by the shift in the negative
depth cluster toward the negative x-direction (second row in the first and third columns).
Along the positive x direction, those vectors point upward (and increase in magnitude
linearly with x). This appears as an upward shift in the negative depth cluster (third
row in the first and third columns).

The same behavior is observed in the plots in the last two rows of the first and third
columns. In each case, the negative depth cluster is shifted somewhat in the direction
consistent with a rotation about the z-axis. This implies, as mentioned earlier, that the axes
of symmetry of these clusters do not intersect at a common point (the origin) because
of these shifts, but rather intersect at several points that are located approximately on a
circle with center at the origin and radius proportional to the magnitude of the assumed
rotation. The plots in the second and fourth columns for a rotation about the negative
z-axis show a similar behavior except that the shifts are now in the opposite directions.
We expect that the axes of symmetry of the clusters intersect almost at a
common point when the magnitude of rotation about the z-axis tends toward zero; that is,
when the correct rotation is assumed (see also the plots given in the first example).

7  Summary

In this paper, we have shown that one can exploit the positiveness of depth as a constraint
in order to estimate the location of the focus of expansion when the motion is either purely
translational or the rotational component is known. The approach is based on the fact
that when an arbitrary point in the image is chosen as the FOE, the depth values that
are computed based on the assumed FOE tend to form clusters of positive and negative
values around the line that connects the assumed FOE to the true FOE; that is, the
line  that  we  referred  to  as  the  FOE  constraint  line.  These  clusters  are  symmetrical  with 
respect  to  the  FOE  constraint  line  and  can  be  used  to  determine  the  direction  toward  the 
true  FOE;  that  is,  the  orientation  of  the  FOE  constraint  line.  By  finding  the  common 
intersection  of  several  such  constraint  lines,  it  is  possible  to  obtain  a  reasonable  estimate 
of  the  true  FOE.  In  two  selected  examples,  we  showed  that  when  the  rotation  is  known, 
the  method  we  suggested  can  give  a  good  estimate  of  the  location  of  the  FOE  in  the 
presence  of  noisy  data  (with  noise  of  as  much  as  60%). 

When  the  rotational  component  is  not  known  (and  is  non-zero),  these  constraint  lines 
do  not  have  a  common  intersection  point.  This  is  reminiscent  of  the  fact  that  motion 
field  vectors  do  not  intersect  at  a  common  point  when  the  viewer  rotates  about  some 
axis  through  the  viewing  point  as  well  as  translating  in  an  arbitrary  direction.  In  this 
case,  we  proposed  a  method  based  on  discounting  the  component  due  to  rotation  (by 
assuming  some  arbitrary  rotation)  before  we  apply  the  method  developed  for  the  case  of 
pure  translation.  Ideally,  a  reasonable  estimate  of  the  FOE  is  obtained  only  when  the 
correct  rotation  is  assumed;  this  corresponds  to  a  distinct  optimum  solution.  We  have  not 
implemented  the  method  to  evaluate  the  accuracy  of  the  solution;  however,  we  presented 
an  example  to  demonstrate  the  behavior  of  the  solution,  with  noisy  data,  where  the 
rotation  vector  was  varied  locally.  The  results  showed  some  of  the  difficulties  we  have  to 
deal  with  in  estimating  3-D  motion  when  the  rotational  component  of  motion  is  unknown. 
For  example,  several  interpretations  were  possible  (based  on  a  qualitative  analysis)  related 
to  the  ambiguity  in  distinguishing  rotation  from  translation  (appropriately  scaled  by 
the  average  distance  of  the  viewer  from  the  scene).  These  interpretations,  however,  are 
consistent with those obtained from the corresponding noisy two-dimensional optical flow
estimate by other means.

8  Appendix:  Image  Plane  Formulae  for  the  FOE


Some of the results presented above have been expressed concisely using vector notation.
It is occasionally helpful to develop corresponding results in terms of the components
of these vectors. Consider, for example, the methods for recovering the FOE from the
brightness gradient at stationary points (where c = 0). Let the FOE be at t = (x_0, y_0, 1)^T.
At a stationary point, c = 0, and so s · t = 0 (unless t · ẑ = 0). This in turn can be
expanded  to  yield 

$$x_0 E_x + y_0 E_y = x E_x + y E_y;$$

that is, the brightness gradient is perpendicular to the line from the stationary point to
the FOE.

Now suppose that we have the brightness gradient at two stationary points, (x_1, y_1)
and (x_2, y_2) say. Then

$$x_0 E_{x_1} + y_0 E_{y_1} = x_1 E_{x_1} + y_1 E_{y_1},$$
$$x_0 E_{x_2} + y_0 E_{y_2} = x_2 E_{x_2} + y_2 E_{y_2},$$

which gives us

$$(E_{x_1} E_{y_2} - E_{x_2} E_{y_1})\, x_0 = (x_1 E_{x_1} + y_1 E_{y_1}) E_{y_2} - (x_2 E_{x_2} + y_2 E_{y_2}) E_{y_1},$$
$$(E_{x_1} E_{y_2} - E_{x_2} E_{y_1})\, y_0 = (x_2 E_{x_2} + y_2 E_{y_2}) E_{x_1} - (x_1 E_{x_1} + y_1 E_{y_1}) E_{x_2}.$$

This in turn yields the location of the FOE, (x_0, y_0), provided that the brightness gradients
at the two stationary points are not parallel. This result corresponds exactly to

$$\mathbf{t} = \frac{\mathbf{s}_1 \times \mathbf{s}_2}{(\mathbf{s}_1 \times \mathbf{s}_2) \cdot \hat{\mathbf{z}}}.$$
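The two-point solution can be written out directly. The sketch below is illustrative only: it solves the pair of linear equations x_0 E_x + y_0 E_y = x E_x + y E_y by Cramer's rule, and the function name and argument layout are hypothetical.

```python
def foe_from_two_stationary_points(p1, g1, p2, g2):
    """FOE (x0, y0) from two stationary points and their brightness gradients.

    p1, p2: image coordinates (x, y) of the stationary points.
    g1, g2: brightness gradients (E_x, E_y) at those points.
    """
    (x1, y1), (ex1, ey1) = p1, g1
    (x2, y2), (ex2, ey2) = p2, g2
    det = ex1 * ey2 - ex2 * ey1
    if abs(det) < 1e-12:
        raise ValueError("gradients are (nearly) parallel")
    d1 = x1 * ex1 + y1 * ey1
    d2 = x2 * ex2 + y2 * ey2
    return ((d1 * ey2 - d2 * ey1) / det, (d2 * ex1 - d1 * ex2) / det)
```

Because each gradient is perpendicular to the line from its stationary point to the FOE, the function in effect intersects two such lines.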

Next, consider the case where many stationary points are known. Suppose there are n
such points. Then we may wish to minimize